Smart Paper Notes

Clustered Federated Learning based on Nonconvex Pairwise Fusion

Xue Yu, Ziyi Liu, Yifan Sun, Wu Wang

Category: Machine Learning

2022-11-08

This study investigates clustered federated learning (FL), one of the formulations of FL with non-i.i.d. data, where the devices are partitioned into clusters and each cluster optimally fits its data with a localized model. We propose a novel clustered FL framework, which applies a nonconvex penalty to pairwise differences of parameters. This framework can automatically identify clusters without a priori knowledge of the number of clusters and the set of devices in each cluster. To implement the proposed framework, we develop a novel clustered FL method called FPFC. Advancing from the standard ADMM, our method is implemented in parallel, updates only a subset of devices at each communication round, and allows each participating device to perform a variable amount of work. This greatly reduces the communication cost while simultaneously preserving privacy, making it practical for FL. We also propose a new warmup strategy for hyperparameter tuning under FL settings and consider the asynchronous variant of FPFC (asyncFPFC). Theoretically, we provide convergence guarantees of FPFC for general nonconvex losses and establish the statistical convergence rate under a linear model with squared loss. Our extensive experiments demonstrate the advantages of FPFC over existing methods.
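The idea of penalizing pairwise parameter differences with a nonconvex penalty can be illustrated with a small sketch. The abstract does not name the penalty, so SCAD is used here purely as an assumed example of a common nonconvex choice. The function names and the default `a=3.7` are illustrative and are not taken from FPFC.

```python
import numpy as np

def scad(t, lam, a=3.7):
    """SCAD penalty at |t| (assumed example of a nonconvex penalty, requires a > 2)."""
    t = np.abs(np.asarray(t, dtype=float))
    linear = lam * t                                           # region |t| <= lam
    quad = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))   # lam < |t| <= a*lam
    const = lam**2 * (a + 1) / 2                               # |t| > a*lam (flat)
    return np.where(t <= lam, linear, np.where(t <= a * lam, quad, const))

def pairwise_fusion_penalty(params, lam, a=3.7):
    """Sum the penalty over all device pairs i < j, applied to ||w_i - w_j||."""
    params = np.asarray(params, dtype=float)
    n = params.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += float(scad(np.linalg.norm(params[i] - params[j]), lam, a))
    return total

# Three hypothetical devices: two nearly identical, one far away.
device_params = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
penalty = pairwise_fusion_penalty(device_params, lam=1.0)
```

Because the penalty is flat beyond `a * lam`, distant devices are not pulled toward each other, while nearby devices still pay a cost proportional to their distance and are encouraged to fuse into one cluster.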

Translated by Google Translate

Bootstrap Generalization Ability from Loss Landscape Perspective

Huanran Chen, Shitong Shao, Ziyi Wang, Zirui Shang, Jin Chen, Xiaofeng Ji, Xinxiao Wu

Category: Computer Vision

2022-09-18

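Domain generalization aims to learn a model that generalizes well to unseen test data, i.e., out-of-distribution data, whose distribution differs from that of the training data. To address domain generalization in computer vision, we introduce loss landscape theory into this field. Specifically, we bootstrap the generalization ability of deep learning models from the loss landscape perspective in four aspects: backbone, regularization, training paradigm, and learning rate. We verify the proposed theory on the NICO++, PACS, and VLCS datasets through extensive ablation studies and visualizations. In addition, we apply this theory to the ECCV 2022 NICO Challenge 1 and achieve third place without using any domain-invariant methods.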


Research: Modeling Price Elasticity for Occupancy Prediction in Hotel Dynamic Pricing

Fanwei Zhu, Wendong Xiao, Yao Yu, Ziyi Wang, Zulong Chen, Quan Lu, Zemin Liu, Minghui Wu, Shenghua Ni

Category: Machine Learning

2022-08-04

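Demand estimation plays an important role in dynamic pricing, where the optimal price can be obtained by maximizing revenue based on the demand curve. On online hotel booking platforms, the demand for, or occupancy of, rooms varies across room types and changes over time, so obtaining an accurate occupancy estimate is challenging. In this paper, we propose a novel hotel demand function that explicitly models the price elasticity of demand for occupancy prediction, and we design a price elasticity prediction model to learn the dynamic price elasticity coefficient from a variety of influencing factors. Our model consists of carefully designed elasticity learning modules that alleviate the endogeneity problem, and it is trained in a multi-task framework to address data sparsity. We conduct comprehensive experiments on real-world datasets and validate that our method outperforms state-of-the-art baselines for both occupancy prediction and dynamic pricing.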
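How a price elasticity coefficient turns into an occupancy estimate and a revenue-maximizing price can be sketched as follows. This is only an illustration under an assumed textbook constant-elasticity demand curve, not the demand function proposed in the paper. All names (`predicted_occupancy`, `best_price`) and all numbers are hypothetical.

```python
def predicted_occupancy(price, base_price, base_occupancy, elasticity):
    """Constant-elasticity demand (an assumption), capped at full occupancy 1.0."""
    occupancy = base_occupancy * (price / base_price) ** (-elasticity)
    return min(occupancy, 1.0)

def best_price(candidates, base_price, base_occupancy, elasticity):
    """Return the candidate price with the highest expected revenue per room."""
    def revenue(price):
        return price * predicted_occupancy(price, base_price, base_occupancy, elasticity)
    return max(candidates, key=revenue)

# Hypothetical room type: 60% occupancy at a reference price of 100.
chosen = best_price([60, 80, 100, 120], base_price=100.0,
                    base_occupancy=0.6, elasticity=1.5)
```

With an elasticity above 1, cutting the price raises revenue only until the occupancy cap is reached, which is why an intermediate price wins in this toy setting.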


Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso

Category: Natural Language Processing | Artificial Intelligence | Machine Learning | Machine Learning (Statistics)

2022-06-09

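Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. To inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks to provide a strong baseline. Findings include: model performance and calibration both improve with scale but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.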


Dynamics-aware Adversarial Attack of 3D Sparse Convolution Network

An Tao, Yueqi Duan, He Wang, Ziyi Wu, Pengliang Ji, Haowen Sun, Jie Zhou, Jiwen Lu

Category: Computer Vision

2021-12-17

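In this paper, we investigate the dynamics-aware adversarial attack problem in deep neural networks. Most existing adversarial attack algorithms are designed under a basic assumption: the network architecture is fixed throughout the attack process. However, this assumption does not hold for many recently proposed networks, such as 3D sparse convolution networks, which contain input-dependent execution to improve computational efficiency. This results in a serious lagged-gradient issue, in which the attack learned at the current step becomes ineffective because the architecture changes afterward. To address this issue, we propose a Leaded Gradient Method (LGM) and show the significant effects of the lagged gradient. More specifically, we reformulate the gradients to be aware of potential dynamic changes in the network architecture, so that when the architecture changes dynamically, the learned attack "leads" the next step better than dynamics-unaware methods do. Extensive experiments on various datasets show that our LGM achieves impressive performance on semantic segmentation and classification. Compared with dynamics-unaware methods, LGM achieves about 20% lower mIoU on average on the ScanNet and S3DIS datasets. LGM also outperforms recent point cloud attacks.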


Multi-Stage Spatio-Temporal Aggregation Transformer for Video Person Re-identification

Ziyi Tang, Ruimao Zhang, Zhanglin Peng, Jinrui Chen, Liang Lin

Category: Computer Vision

2023-01-02

In recent years, the Transformer architecture has shown its superiority in the video-based person re-identification task. Inspired by video representation learning, these methods mainly focus on designing modules to extract informative spatial and temporal features. However, they are still limited in extracting local attributes and global identity information, which are critical for the person re-identification task. In this paper, we propose a novel Multi-Stage Spatial-Temporal Aggregation Transformer (MSTAT) with two newly designed proxy embedding modules to address this issue. Specifically, MSTAT consists of three stages that encode the attribute-associated, the identity-associated, and the attribute-identity-associated information from the video clips, respectively, achieving a holistic perception of the input person. We combine the outputs of all the stages for the final identification. In practice, to save computational cost, the Spatial-Temporal Aggregation (STA) modules are first adopted in each stage to conduct the self-attention operations along the spatial and temporal dimensions separately. We further introduce the Attribute-Aware and Identity-Aware Proxy embedding modules (AAP and IAP) to extract informative and discriminative feature representations at different stages. All of them are realized by employing newly designed self-attention operations with specific meanings. Moreover, temporal patch shuffling is also introduced to further improve the robustness of the model. Extensive experimental results demonstrate the effectiveness of the proposed modules in extracting informative and discriminative information from the videos, and show that MSTAT can achieve state-of-the-art accuracies on various standard benchmarks.
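Running self-attention along the spatial and temporal dimensions separately, rather than jointly over all tokens, can be sketched in a few lines. This is a minimal illustration and not the STA module itself. It assumes single-head attention with identity projections and assumes the spatial pass is applied before the temporal pass, neither of which is stated in the abstract.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Single-head attention with identity projections. x has shape (..., N, D)."""
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(x.shape[-1])  # (..., N, N)
    return softmax(scores, axis=-1) @ x

def spatial_then_temporal_attention(tokens):
    """tokens: (T, S, D) = frames x patches per frame x channels."""
    spatial = self_attention(tokens)                       # attends over S per frame
    temporal = self_attention(np.swapaxes(spatial, 0, 1))  # attends over T per patch
    return np.swapaxes(temporal, 0, 1)                     # back to (T, S, D)

clip = np.random.default_rng(0).standard_normal((4, 6, 8))  # hypothetical clip
out = spatial_then_temporal_attention(clip)
```

For T frames of S patches each, the two passes build score matrices with T·S² + S·T² entries in total, compared with (T·S)² entries for joint attention over all tokens, which is the source of the computational saving.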


APOLLO: A Simple Approach for Adaptive Pretraining of Language Models for Logical Reasoning

Soumya Sanyal, Yichong Xu, Shuohang Wang, Ziyi Yang, Reid Pryzant, Wenhao Yu, Chenguang Zhu, Xiang Ren

Category: Natural Language Processing | Artificial Intelligence | Machine Learning

2022-12-19

Logical reasoning of text is an important ability that requires understanding the information present in the text, their interconnections, and then reasoning through them to infer new conclusions. Prior works on improving the logical reasoning ability of language models require complex processing of training data (e.g., aligning symbolic knowledge to text), yielding task-specific data augmentation solutions that restrict the learning of general logical reasoning skills. In this work, we propose APOLLO, an adaptively pretrained language model that has improved logical reasoning abilities. We select a subset of Wikipedia, based on a set of logical inference keywords, for continued pretraining of a language model. We use two self-supervised loss functions: a modified masked language modeling loss where only specific parts-of-speech words, that would likely require more reasoning than basic language understanding, are masked, and a sentence-level classification loss that teaches the model to distinguish between entailment and contradiction types of sentences. The proposed training paradigm is both simple and independent of task formats. We demonstrate the effectiveness of APOLLO by comparing it with prior baselines on two logical reasoning datasets. APOLLO performs comparably on ReClor and outperforms baselines on LogiQA.
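The modified masked language modeling objective, in which only words with particular part-of-speech tags are eligible for masking, can be illustrated with a short sketch. The tag set, the masking probability, and the function name `mask_by_pos` are assumptions for illustration. The abstract does not specify which tags APOLLO selects, and the tokens here are assumed to arrive already tagged.

```python
import random

def mask_by_pos(tagged_tokens, maskable_pos, mask_prob=0.15,
                mask_token="[MASK]", seed=0):
    """Mask only tokens whose POS tag is in maskable_pos.

    Returns (inputs, labels); labels hold the original token at masked
    positions and None elsewhere, so the loss is computed only there.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for token, pos in tagged_tokens:
        if pos in maskable_pos and rng.random() < mask_prob:
            inputs.append(mask_token)
            labels.append(token)
        else:
            inputs.append(token)
            labels.append(None)
    return inputs, labels

# Hypothetical pre-tagged sentence and hypothetical set of maskable tags.
sentence = [("if", "SCONJ"), ("it", "PRON"), ("rains", "VERB")]
inputs, labels = mask_by_pos(sentence, {"SCONJ", "VERB"}, mask_prob=1.0)
```

Setting `mask_prob=1.0` masks every eligible token, which makes the selection rule easy to see. A training setup would use a smaller probability.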


Real-Time Deformable-Contact-Aware Model Predictive Control for Force-Modulated Manipulation

Lasitha Wijayarathne, Ziyi Zhou, Ye Zhao, Frank L. Hammond III

Category: Robotics

2022-12-19

Force modulation of robotic manipulators has been extensively studied for several decades. However, it is not yet commonly used in safety-critical applications due to a lack of accurate interaction contact modeling and weak performance guarantees, a large proportion of which concern the modulation of interaction forces. This study presents a high-level framework for simultaneous trajectory optimization and force control of the interaction between a manipulator and soft environments, which is prone to external disturbances. Sliding friction and normal contact force are taken into account. The dynamics of the soft contact model and the manipulator are simultaneously incorporated in a trajectory optimizer to generate desired motion and force profiles. A constrained optimization framework based on the Alternating Direction Method of Multipliers (ADMM) has been employed to efficiently generate real-time optimal control inputs and high-dimensional state trajectories in a Model Predictive Control fashion. Experimental validation of the model performance is conducted on a soft substrate with known material properties using a Cartesian space force control mode. Results show a comparison of ground truth and real-time model-based contact force and motion tracking for multiple Cartesian motions in the valid range of the friction model. It is shown that a contact model-based motion planner can compensate for frictional forces and motion disturbances and improve the overall motion and force tracking accuracy. The proposed high-level planner has the potential to facilitate the automation of medical tasks involving the manipulation of compliant, delicate, and deformable tissues.


De-risking Carbon Capture and Sequestration with Explainable CO2 Leakage Detection in Time-lapse Seismic Monitoring Images

Huseyin Tuna Erdinc, Abhinav Prakash Gahlot, Ziyi Yin, Mathias Louboutin, Felix J. Herrmann

Category: Artificial Intelligence | Computer Vision

2022-12-16

With the growing global deployment of carbon capture and sequestration technology to combat climate change, monitoring and detection of potential CO2 leakage through existing or storage induced faults are critical to the safe and long-term viability of the technology. Recent work on time-lapse seismic monitoring of CO2 storage has shown promising results in its ability to monitor the growth of the CO2 plume from surface recorded seismic data. However, due to the low sensitivity of seismic imaging to CO2 concentration, additional developments are required to efficiently interpret the seismic images for leakage. In this work, we introduce a binary classification of time-lapse seismic images to delineate CO2 plumes (leakage) using state-of-the-art deep learning models. Additionally, we localize the leakage region of CO2 plumes by leveraging Class Activation Mapping methods.
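Localizing a region with Class Activation Mapping can be sketched as a weighted sum of the last convolutional feature maps, using the classifier weights of the class of interest. This is a minimal illustration of the basic CAM formulation under assumed shapes, not the specific variants used in the paper. The clipping and normalization steps are added only for display.

```python
import numpy as np

def class_activation_map(feature_maps, class_weights):
    """Basic CAM: weighted sum over channels of the final feature maps.

    feature_maps: (K, H, W) activations of the last convolutional layer.
    class_weights: (K,) classifier weights for the class of interest.
    Returns an (H, W) map scaled to [0, 1] for visualization.
    """
    cam = np.tensordot(class_weights, feature_maps, axes=([0], [0]))  # (H, W)
    cam = np.maximum(cam, 0.0)  # keep positive evidence only (display choice)
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Hypothetical 2-channel, 2x2 feature maps for a "leakage" class.
features = np.zeros((2, 2, 2))
features[0, 0, 0] = 1.0
features[1, 1, 1] = 2.0
heatmap = class_activation_map(features, np.array([1.0, 1.0]))
```

In practice the resulting map is upsampled to the input image size and overlaid on the seismic image to highlight the pixels that drove the classification.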


Unifying Human Motion Synthesis and Style Transfer with Denoising Diffusion Probabilistic Models

Ziyi Chang, Edmund J. C. Findlay, Haozheng Zhang, Hubert P. H. Shum

Category: Computer Vision | Artificial Intelligence

2022-12-16

Generating realistic motions for digital humans is a core but challenging part of computer animations and games, as human motions are both diverse in content and rich in styles. While the latest deep learning approaches have made significant advancements in this domain, they mostly consider motion synthesis and style manipulation as two separate problems. This is mainly due to the challenge of learning both motion contents that account for the inter-class behaviour and styles that account for the intra-class behaviour effectively in a common representation. To tackle this challenge, we propose a denoising diffusion probabilistic model solution for styled motion synthesis. As diffusion models have a high capacity brought by the injection of stochasticity, we can represent both inter-class motion content and intra-class style behaviour in the same latent. This results in an integrated, end-to-end trained pipeline that facilitates the generation of optimal motion and exploration of content-style coupled latent space. To achieve high-quality results, we design a multi-task architecture of diffusion model that strategically generates aspects of human motions for local guidance. We also design adversarial and physical regulations for global guidance. We demonstrate superior performance with quantitative and qualitative results and validate the effectiveness of our multi-task architecture.
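The "injection of stochasticity" in a denoising diffusion probabilistic model refers to the forward process that gradually mixes Gaussian noise into the data. A minimal sketch of that forward step is given below. The linear noise schedule, the number of steps, and the toy "motion" array are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I)."""
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return xt, noise

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
motion = rng.standard_normal((16, 24))  # hypothetical: 16 frames x 24 pose values
noisy_motion, noise = q_sample(motion, t=500, alpha_bar=alpha_bar, rng=rng)
```

A denoising network is then trained to predict `noise` from `noisy_motion` and `t`. Generation runs the learned reverse process starting from pure noise.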
